STOCHASTIC MODELS FOR THE INTERPRETATION OF
METEOROLOGICAL  DATA 
By 

F.  N.  David 

1. The Statement of Work to be accomplished under the contract
was set out as follows:

Item  1.  Discover  and  describe  one  or  more  chance  mechanisms 
or  stochastic  processes  for  the  behavior  of  atmospheric  variables, 
and  similar  random  variables,  which  would  lead  to  the  observed 
serial  and  spatial  correlations  of  winds,  temperatures,  pressures, 
precipitation,  etc. 

Item 2. Develop optimum procedures for estimating the parameters
of the mechanisms, and for determining their significance.

2. Work began on the contract on June 1, 1961. At the behest
of Dr. Arnold Court I visited, with an introduction from him,
Mr. C. S. Durst, then recently retired from the Meteorological
Office. Mr. Durst was widely known for his work on winds and had
published a great many important papers on the topic.* He was
generous of his time and in addition to making known to me all the
available literature he told me in detail, during the first and
subsequent visits, such ideas as he had for further research on
these problems. That I did not take advantage of his generosity
was solely because my own research developed along different lines.

* It had been my hope to collaborate with him on my return to
England but he died in December, 1961.

3. In mid-June, having completed the gathering of information
in London as far as time allowed--I was not able to take advantage
of an invitation from the Director to visit the Royal Meteorological
Office--I visited with Dr. Arnold Court, then chief of the Applied
Climatology Branch in Waltham, Massachusetts. We spent one whole
working day together during which he reinterpreted the Statement of
Work and asked for an empirical interpretation of spatial correlations
between pressure measurements. This, then, was the first
problem which I tackled. [Report AFCRL-62-461(I).]

4. Pressure as a Function of Distance: I. [AFCRL-62-461(I)]
The raw material of the empirical investigation consisted of the
daily pressures at stations in the United States and Canada for the
three months January, February, and March of the years 1949-1953.
The heights at which the pressure was 500 m.b. were found and the
correlation coefficients between these heights, taking each station
in turn with every other station, were calculated. Enough stations
were taken to enable contour lines to be drawn. An empirical
surface to describe the space-correlation surface thus formed was
requested.

Using  parallels  of  latitude  as  delineating  possible  cross 
sections  of  the  surface  the  functional  form  of  such  cross  sections 
was  investigated.  The  various  functional  forms  which  were  tried 
and  the  variety  of  methods  of  fitting  them  are  described  in  the 
report,  with  the  manipulative  mathematics  necessary.  It  is  enough 
here  to  note  that  one  was  driven  to  the  inescapable  conclusion  that 
the  functional  form  must  consist  of  a  sum  of  damped  harmonics  and 
that  the  width  of  the  United  States  was  not  sufficient  to  allow 
estimation of the parameters involved. Thus if a station X is
chosen as origin, and the correlation of 500 m.b. heights $r_{X,i}$
with a station $X_i$ ($i = 1, 2, \ldots$) on the same latitude and a
distance $d_i$ from X is calculated, then an estimate of this
correlation may be obtained from

$$r_{X,i} = \sum_{j=1}^{p} e^{-\alpha_j \varphi(d_i)} A_j \cos(\beta_j d_i + \gamma_j) \tag{1}$$

where $\alpha_j$, $A_j$, $\beta_j$, $\gamma_j$ are constants to be determined and
$\varphi(d_i)$ is equal to $d_i$ or $d_i^2$. It is clearly not enough to take
$p = 1$ but how large $p$ should be it is not possible to say, neither is
it possible to determine what $\varphi$ should be.
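As an illustration only, the form (1) can be evaluated numerically as in the following Python sketch. The function name, the argument layout and the parameter values are assumptions made for the example; they are not estimates from the report, which found that the parameters could not be estimated from the available data.

```python
import math

def damped_harmonic_correlation(d, terms, phi=lambda x: x):
    """Evaluate the sum of damped harmonics in (1) at distance d.

    terms: iterable of (alpha, A, beta, gamma), one tuple per harmonic.
    phi:   damping argument, phi(d) = d by default; pass lambda x: x * x
           for the alternative phi(d) = d**2.
    """
    return sum(
        A * math.exp(-alpha * phi(d)) * math.cos(beta * d + gamma)
        for alpha, A, beta, gamma in terms
    )

# Purely illustrative parameter values (two harmonics, amplitudes summing to 1
# so that the correlation at zero distance is 1).
example_terms = [(0.02, 0.7, 0.10, 0.0), (0.05, 0.3, 0.25, 0.0)]

for distance in (0, 10, 20, 40):
    print(distance, round(damped_harmonic_correlation(distance, example_terms), 3))
```

With a single term the curve is one damped cosine; adding terms is what makes the parameters hard to identify when the observed distances cover less than one period.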

Given that one may eliminate by trial and error the common
statistical functions used to describe correlation data, it was found
that no estimates of the damped harmonics parameters are possible
since the distance between the two stations farthest apart was not
large enough to cover a complete period. The general pattern of the
beginning of the space correlation data was very like the pattern
of the serial time correlations investigated by Gilbert Walker--like
enough to enable one to be reasonably sure that (1) was
appropriate--but his correlations oscillated about -0.2 whereas
it was not possible to see what the space correlations did.

Early in the investigation it became clear that the empirical
approach involving choosing an arbitrary functional form to graduate
the space-correlogram was inappropriate for the problem. It was a
good idea but the data are such that it did not work out. Later in
the year a new approach was adopted.

5. Pressure as a Function of Distance: II. [AFCRL-62-1012]
The problem of relating the correlation of pressure measurements at
two stations with the distance between them was investigated from
one aspect in the first paper. It is suggested that a more fruitful
approach  to  the  interpretation  of  such  correlations  is  by  building 
a  stochastic  model.  The  model  delineated  in  Pressure  as  a  Function 
of  Distance:  II  is  almost  certainly  too  simple  to  describe  what 
is,  after  all,  a  very  complex  state  of  affairs,  but  time  did  not 
allow  the  appropriate  generalizations  to  be  made.  Such  results  as 
are  obtained,  however,  suggest  that  this  method  of  attack  on  the 
problem  is  entirely  suitable  for  the  purposes  of  generating  a 
correlation  surface  such  as  was  described  as  being  required  in 
Pressure  as  a  Function  of  Distance:  I.  Generalizations  both  of 
model  and  of  the  method  of  approach  to  the  problem  are  suggested 
in  a  further  section  of  this  report. 

It is certain that the position of the estimated high pressure
center nearest the geographical center of the United States can be
marked on a map at a fixed time each morning (say). If we wish to
study, as Court did, pressures for the months of January, February
and March, we would wish to know the variation of the position of
the high pressure center for these three months for a number of
years. We may calculate the average of all these positions and use
this as a center of coordinates with axes of reference the parallel
of latitude and the meridian of longitude running through this
center. (It is plausible to suggest that this average position will
be not very different from the geographical center, but if it is
found not to be so, but varies (say) according to the season of the
year, this could be allowed for.) It will be assumed for the purposes
of this present model that the daily high pressure center is
distributed about the average center as in a normal bivariate
surface, the variability along the line of latitude being $\sigma_1$,
along the line of longitude being $\sigma_2$, and with correlation $\rho$.
For any day $\sigma_1$, $\sigma_2$ and $\rho$ are constants but in the over-all
picture they are regarded as random variables each having a p.d.f.

Consider any station of known geographical position, and suppose
that the pressure recorded at the fixed time on a given day is
built up from a basic pressure, here regarded as constant throughout,
plus a constant multiple, c, of a random quantity. This quantity
is supposed to depend on the distance of the station from the high
pressure center recorded for the fixed time on the given day. Thus
if (X, Y) are coordinates of the high pressure center referred to
the average position, and (U, V) are the coordinates of the station
referred to the same origin and axes, p is the actual pressure and
$p_0$ the basic pressure, we assume that

$$p - p_0 = c \exp\left\{-\frac{1}{2\theta^2}\left[(U-X)^2 + (V-Y)^2\right]\right\}.$$

$\theta$ will be treated as variable. For fixed $\sigma_1$, $\sigma_2$, $\rho$ and $\theta$,
$\delta p = p - p_0$ can be averaged over all possible values of (X, Y) and
then, assuming that $\sigma_1^2$, $\sigma_2^2$ and $\theta^2$ are distributed as gamma
variables with

$$E(\sigma_1^2) = b, \quad E(\sigma_2^2) = c, \quad E(\theta^2) = a$$

and that

$$p(\rho) = \tfrac{3}{4}(1-\rho)(1+\rho), \quad -1 \le \rho \le +1,$$

the average of $\delta p$ is obtained allowing for these variations. The
necessary integrations were difficult and the answer was obtained
as a series expansion, attention being paid to relative orders of
magnitude. Var(p) may similarly be deduced and also the covariance
between the pressure measurements recorded at two different stations.
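A rough Monte Carlo check of this construction is sketched below in Python. It is a simplification and not the series expansion of the report: the spreads of the high pressure center, their correlation and the width of the pressure hump are held fixed instead of being given p.d.f.'s, the constant multiple c is set to 1 (it cancels in a correlation), and the hump is taken to be exp{-[(U-X)^2 + (V-Y)^2] / (2 theta^2)}. All function names and numerical values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pressure_excess(station, center, theta):
    """Excess over the basic pressure at a station, with c = 1 (assumed form)."""
    du = station[0] - center[0]
    dv = station[1] - center[1]
    return math.exp(-(du * du + dv * dv) / (2.0 * theta * theta))

def simulate_correlation(station_1, station_2, sigma1, sigma2, rho, theta,
                         n_days=20000, seed=0):
    """Correlation of the simulated pressures at two stations over n_days."""
    rng = random.Random(seed)
    excess_1, excess_2 = [], []
    for _ in range(n_days):
        # Daily high pressure center: bivariate normal about the origin.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        center = (sigma1 * z1,
                  sigma2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
        excess_1.append(pressure_excess(station_1, center, theta))
        excess_2.append(pressure_excess(station_2, center, theta))
    return pearson(excess_1, excess_2)

# Hypothetical stations (coordinates in the same units as sigma1, sigma2).
near = simulate_correlation((5.0, 0.0), (6.0, 0.0), 18.0, 18.0, 0.0, 5.7)
far = simulate_correlation((5.0, 0.0), (-40.0, 30.0), 18.0, 18.0, 0.0, 5.7)
print(round(near, 3), round(far, 3))
```

Stations close together relative to the hump width give a correlation near 1, while widely separated stations give a correlation near zero, which is the qualitative behavior the analytical approximation is meant to capture.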

It was supposed that $a = \alpha b = \beta c$ where $\alpha$ and $\beta$ are small
and less than unity. Further, U and V were standardized and we
write

$$U^* = \frac{U}{\sqrt{b}}, \quad V^* = \frac{V}{\sqrt{c}}$$

so that

$$R^2 = U^2 + V^2, \quad R^{*2} = U^{*2} + V^{*2}.$$

R will thus be the actual distance of the station from the average
position of the high pressure centers and R* the standardized
distance. If we consider two stations with coordinates $(U_1, V_1)$
and $(U_2, V_2)$ it was shown, to a first order of approximation, that
we have, for the correlation between the pressure measurements at
two stations,

[The expression for $r_{12}$ and the definition of $W_1$ are missing in the source.]

and similarly for $W_2$. So far b and c and therefore $\alpha$ and $\beta$
have been supposed different. Some simplification results if we
write $\alpha = \beta = \delta$ so that, say,

$$R_1^{*2} = U_1^{*2} + V_1^{*2} = (U_1^2 + V_1^2)/b = D_1^2; \quad R_2^{*2} = D_2^2;$$

$$(U_1^* - U_2^*)^2 + (V_1^* - V_2^*)^2 = D^2.$$

Thus $D_1$ and $D_2$ are proportional to the distances of the two
stations from the origin and D to the distance between them. We
have then as our first approximation

[The simplified expression for $r_{12}$ is missing in the source.]

6. Application of Deduced Formula Correlation. So far the
model has been built and the consequences deduced without reference
to the data of experience. Time did not permit a data check of the
foundations of the model. We proceeded therefore to guess values
for b and for $\delta$ and for the origin of coordinates. We took
therefore, entirely arbitrarily, the origin of coordinates (i.e.,
the average position of the high pressure centers) at 40°N, 100°W,
and for the sake of example calculated the distances of stations 562,
553, 451, 445, 764 and 662 from this center, the distances being
those on the sphere. The distances of all the stations from 562
were also found. Again guessing we chose $\alpha = \beta = \delta = 0.1$ and
$\sqrt{b} = 18°$. From our formula for $r_{12}$ we then worked out the
correlations to be expected between the pressure measurements at
station 562 and all the other stations. These correlations should not
be directly comparable with those calculated by Dr. Cooley from actual
data in that his correlations were, I believe, based on average
daily measurements, but they should be reasonably in accord with
his. The correlations based on our formula and the ones actually
observed are shown in the table below:

Table: Correlations of Pressure Measurements
With Those of Station 562

Station               553    451    445    764    662

Actual correlation    0.81   0.89   0.53   0.74   0.86
Formula correlation   0.81   0.90   0.50   0.75   0.87
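The distances "on the sphere" used for this table can be obtained with a standard great-circle formula. The following Python sketch uses the haversine form and expresses the result in degrees of arc, so that it can be divided by the guessed scale of 18°. The origin is assumed here to be latitude 40° N, longitude 100° W; the station coordinates are hypothetical, since the report does not list them.

```python
import math

def great_circle_degrees(lat1, lon1, lat2, lon2):
    """Angular distance in degrees of arc between two points on a sphere.

    Latitudes and longitudes are in degrees; west longitudes are negative.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

ORIGIN = (40.0, -100.0)             # assumed average high pressure center
hypothetical_station = (41.1, -100.7)  # illustrative coordinates only

distance = great_circle_degrees(*ORIGIN, *hypothetical_station)
standardized = distance / 18.0      # distance in units of the guessed scale
print(round(distance, 3), round(standardized, 3))
```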

7. Discussion of Results. The stations chosen were those near
562  because  it  was  anticipated  that  given  the  approximations  involved, 
and  there  were  many,  the  correlations  would  be  too  large  when  the 
distances  between  the  stations  became  large.  Again  when  the 
distances  from  the  average  position  became  large  it  is  to  be 
expected  that  the  correlations  will  be  overemphasized.  Further 
calculations  using  different  groups  of  stations  showed  this  to  be 
indeed  the  case.  Thus  while  the  results  obtained  are  somewhat 
remarkable  in  that  those  produced  by  the  model  without  reference 
to  real  data  agree  closely  with  those  actually  obtained  from  real 
data  this  can,  I  think,  be  no  more  than  an  indication  of  what  might 
be  done  by  building  further  models  of  this  kind.  The  first  thing 
which  clearly  it  is  important  to  do,  is  to  obtain  some  idea  of  the 
average  position  of  the  high  pressure  centers,  and  of  the  other 
parameters involved in the model. Only then is it possible to say
whether the model is really serviceable and worthy of further
development.

8.  Further  Possible  Research.  Emphasis  has  been  laid  on  the 
fact  that  the  stochastic  model  proposed  above  is  too  simple  to 
describe the pressure complex, and moreover that the approximations
made  in  working  out  the  algebra  of  the  model  are  such  as  to  render 
it  only  of  rudimentary  value.  It  is  noted  immediately  above  that 
for  stations  close  to  what  is  anticipated  to  be  the  average  center 
of  the  high  pressure  system  the  model  is  useful,  but  what  is  required 
is  a  model  which  will  be  adequate  for  prediction  over  a  very  wide 
range.  Clearly  the  numerical  investigations  noted  as  necessary  in 
section  7  should  be  the  first  part  of  any  further  research.  Given 
that  some  idea  is  thus  obtained  then  the  following  are  a  few  of  the 
ways  in  which  further  mathematical  attack  appears  possible. 

(i)  A  modification  of  the  model  given  here  can  be  made  by 
introducing  more  variability  and  assuming  that  the  average  position 
of  the  high  pressure  centers  varies  about  (say)  the  geographical 
center  of  the  United  States. 

(ii) The constant c may be assumed to be different for different
stations.

(iii) The isobaric contours suggest not a normal bivariate
surface, as was assumed with the present model, but rather a double
Edgeworth surface. (Probably this is the reason why the present
model is inadequate for large distances.) The mathematical analysis
involved in introducing a double Edgeworth series in this way is not
difficult although it is laborious. As many parameters as desired
can be introduced into the model by taking successive terms of the
series. For the type of approach to the problem of pressure as a
function of distance as illustrated in AFCRL-62-1012 the assumption
of the double Edgeworth series will probably give as general a
result for the correlations as can be expected.

(iv) A different method of approach to the pressure measurement
problem appears possible along the lines of that used for the
"effective" stochastic model for precipitation (see below). One
could assume the center of the high pressure system traveling along
a path at a given rate, the various descriptive quantities intervening
being assumed to be random variables with given p.d.f.'s. In
this way a stochastic model could be built up and if required the
further refinement of there being several high pressure centers
could be added. I had a preliminary look at such a method of
approach and it appears entirely possible.

9. An "Effective" Stochastic Model for Interpreting the
Correlation Between the Precipitation at Two Stations. [AFCRL-62-495]
In this paper I was concerned with interpreting precipitation data
as actually measured on the ground. To this end I introduced a
concept, which although not known to me before must have been
invented many times previously, that of the "effective" path of a
rainstorm. Suppose any number of stations, spread over a large
area, all measuring the precipitation from a rainstorm simultaneously
at certain fixed points of time. At any given time point, knowing
the geographical positions of the stations and the amount of
precipitation at each we may calculate the geographical position of the
center of gravity of the precipitation from the several stations,
and we may do this for each time point. If the rainstorm is moving
then the center of gravity of measured precipitation will be
different for each time point, and the best fitting curve to these
centers of gravity will give what I have called the effective path
of the rainstorm and will enable the rate at which the storm is
moving to be estimated. I have assumed the effective path to be
linear but no intrinsic difficulty is introduced if any other
functional form is envisaged. It would be desirable for numerical
work to be carried out to investigate the actual effective paths of
rainstorms and their relationship, if any, with the low pressure
center of the storm.
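The construction of an effective path can be sketched in Python as follows. This is only one possible reading of the description: the center of gravity is taken as the precipitation-weighted mean of the station positions, the linear path is fitted by ordinary least squares of v on u (which would fail for a path parallel to the v axis), and the rate is taken as the mean step length between successive centers divided by the time step. The function names and the station data are hypothetical.

```python
import math

def center_of_gravity(stations, amounts):
    """Precipitation-weighted mean position of the stations at one time point.

    stations: list of (u, v) positions; amounts: precipitation at each station.
    Time points with no precipitation anywhere must be skipped by the caller.
    """
    total = sum(amounts)
    u = sum(a * s[0] for s, a in zip(stations, amounts)) / total
    v = sum(a * s[1] for s, a in zip(stations, amounts)) / total
    return (u, v)

def fit_line(points):
    """Least-squares line v = intercept + slope * u through the points."""
    n = len(points)
    mean_u = sum(p[0] for p in points) / n
    mean_v = sum(p[1] for p in points) / n
    suu = sum((p[0] - mean_u) ** 2 for p in points)
    suv = sum((p[0] - mean_u) * (p[1] - mean_v) for p in points)
    slope = suv / suu
    return (mean_v - slope * mean_u, slope)

def mean_speed(points, dt):
    """Average distance moved per unit time between successive centers."""
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps) / (len(steps) * dt)

# Hypothetical network of four stations and three successive readings.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
readings = [[4.0, 1.0, 1.0, 0.0], [1.0, 2.0, 2.0, 1.0], [0.0, 1.0, 1.0, 4.0]]

centers = [center_of_gravity(stations, r) for r in readings]
path_intercept, path_slope = fit_line(centers)
print(centers, path_intercept, path_slope, mean_speed(centers, dt=1.0))
```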


[Figure: the effective path $E_p$ crossing the v axis at a distance D from the origin of coordinates, at an angle $\theta$.]

If linearity is assumed then the effective path $E_p$ of the
storm can be described by two quantities. Consider any axes of
reference (u, v) and any origin of coordinates. Let $E_p$ cross
the v axis at a distance D from the origin of coordinates at
an angle $\theta$ as shown. These descriptive quantities $\theta$, D will be
different for different storms and it will be supposed that for
all storms

$$p(D) = \frac{dD}{D_1 + D_2}, \quad -D_1 \le D \le D_2,$$

$$p(\theta) = \tfrac{1}{2} \sin\theta \, d\theta, \quad 0 \le \theta \le \pi.$$

The center of gravity of the storm, which we will speak of as the
effective center, may for any particular storm be supposed to be
moving at a constant rate of r miles per unit of time, and for
all storms it will be assumed that

$$p(r) = \frac{1}{\Gamma(f+1)} r^{f} e^{-r} \, dr, \quad 0 \le r < +\infty.$$

Now let us consider the composition of the storm. We build this
up from the effect on the ground where the precipitation will be
assumed finally to arrive as drops of water. Suppose each storm
is composed of a number of showers. Each of these showers will be
assumed to have an effective center, i.e., a center of gravity of
the drops which compose it. These centers of gravity (or effective
centers) may be supposed to be distributed in bivariate normal
fashion, with center the effective storm center, the minor axis of
the elliptic contours of equal density being the effective path of
the storm, and the standard deviations in the directions of the
major and minor axes being $\Sigma_2$ and $\Sigma_1$. Further, given any
particular effective shower center the drops composing this shower
are assumed to have a spatial bivariate normal distribution with
center the shower center, with standard deviations along the major
and minor axes of equal density ellipses equal to $\sigma_2$ and $\sigma_1$ and
with minor axis parallel to the effective path of the storm. When
drops fall on the ground it is supposed that all the drops composing
one shower have fallen simultaneously out of the storm and that
they fall independently of one another. Further, any shower will
be assumed to fall independently of any other shower. We have thus
set up a "random" mechanism to describe the observed ground effects.
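The three distributional assumptions for the path of a storm (rectangular for D, proportional to sin θ on (0, π) for the angle, gamma for the rate r) are easy to sample. The Python sketch below is an illustration under those assumptions only; the function name and the numerical values of the limits and of the gamma index are hypothetical.

```python
import math
import random

def sample_storm_path(rng, d1, d2, f):
    """Draw (D, theta, r) for one storm.

    D     : rectangular on (-d1, d2).
    theta : density (1/2) sin(theta) on (0, pi), sampled by inverting its
            distribution function F(theta) = (1 - cos(theta)) / 2.
    r     : gamma with density proportional to r**f * exp(-r),
            i.e. shape f + 1 and unit scale.
    """
    distance = rng.uniform(-d1, d2)
    theta = math.acos(1.0 - 2.0 * rng.random())
    rate = rng.gammavariate(f + 1.0, 1.0)
    return distance, theta, rate

rng = random.Random(1)
storms = [sample_storm_path(rng, d1=50.0, d2=80.0, f=2.0) for _ in range(20000)]

mean_theta = sum(s[1] for s in storms) / len(storms)
mean_rate = sum(s[2] for s in storms) / len(storms)
print(round(mean_theta, 3), round(mean_rate, 3))
```

Under these assumptions the mean angle is π/2 and the mean rate is f + 1, which gives a quick check on the sampler.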




It remains to characterize the intensity of the storm. We
will suppose that all rain drops are of the same size for convenience.
(There is no intrinsic difficulty in introducing a drop
size distribution into the model.) If one particular shower, say
the ith, has $d_i$ drops, we will let

$$p(d_i) = \frac{\eta^{d_i} e^{-\eta}}{d_i!}, \quad d_i = 0, 1, 2, \ldots, \infty.$$

$\eta$ will thus represent the average number of drops in a shower for
a given storm. If the probability that a shower falls out of the
given storm in time dt is $\lambda\,dt$, and not more than one shower is
allowed to fall in this given instant of time, $\lambda$ will represent
the average number of showers which fall from the given storm in
the fixed time. $\lambda$ and $\eta$ will therefore together represent the
intensity of the given storm. We shall assume $\lambda$ and $\eta$ are
independent of each other and of r, the rate at which the effective
storm center is moving. (An obvious generalization of the model is
to assume that they are all correlated in some way.) $\lambda$ and $\eta$
will be different for different storms and in the absence of further
practical information we will assume

$$p(\eta) = \frac{6}{A^3}\, \eta (A - \eta) \, d\eta, \quad 0 \le \eta \le A,$$

$$p(\lambda) = \frac{6}{B^3}\, \lambda (B - \lambda) \, d\lambda, \quad 0 \le \lambda \le B,$$

where A and B are two positive numbers with, it may reasonably
be supposed if required, A > B.
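The intensity assumptions can be simulated directly. A density proportional to η(A - η) on (0, A) is A times a Beta(2, 2) variable, and a shower rate of λ per unit time with at most one shower per instant gives a Poisson number of showers in unit time. The Python sketch below rests on those two observations; the Poisson sampler is the simple multiplication method, suitable only for moderate means, and the function names and the values of A and B are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Poisson variate by multiplying uniforms (adequate for moderate means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_intensity(rng, a, b):
    """Draw (eta, lam): densities proportional to x(A - x) and x(B - x)."""
    eta = a * rng.betavariate(2.0, 2.0)
    lam = b * rng.betavariate(2.0, 2.0)
    return eta, lam

def drops_in_unit_time(rng, a, b):
    """Total number of drops falling from one storm in one unit of time."""
    eta, lam = sample_intensity(rng, a, b)
    n_showers = poisson(rng, lam)
    return sum(poisson(rng, eta) for _ in range(n_showers))

rng = random.Random(2)
A, B = 20.0, 6.0   # illustrative values with A > B
totals = [drops_in_unit_time(rng, A, B) for _ in range(20000)]
mean_total = sum(totals) / len(totals)
print(round(mean_total, 2))
```

Since η and λ are independent with means A/2 and B/2, the expected total is (A/2)(B/2), which is 30 for the values used here.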

It is desired to compute the correlation between the rainfall
at two stations. This can be done for a particular storm or, in
line with the data the writer was given for pressure measurements,
by forming a bivariate table of the amounts of precipitation at
two stations for a given period of a year over a number of years,
and from this table computing the correlation. The model given
can be used for either case. Let the two stations be $W_1$ and $W_2$
and let the catchment (or target) areas in each be of area $K_1^2$
and $K_2^2$, respectively. If the geographical coordinates of $W_1$
and $W_2$ are $(U_1, V_1)$ and $(U_2, V_2)$, referred to the given
axes of coordinates, with

$$U_2 - U_1 = R_1, \quad V_2 - V_1 = R_2, \quad R^2 = R_1^2 + R_2^2,$$

together with the quantity $R_1 \sin\theta - R_2 \cos\theta$ (its symbol is
illegible in the source), then R is the distance between the two
stations. The required correlation is

[The expression for the correlation, which involves a quantity J, is missing in the source.]


When R is large, J will not be very different from zero, and
the correlation between the precipitation at the two stations will
be negative. When R is small there will be a positive correlation.
An explicit expression for J does not appear possible except in
special cases but two expansions can be evolved which will be
adequate.

a) R small and < $2\sigma_1$.

It was assumed that $\sigma_1$ was smaller than $\sigma_2$ and this
expression will be useful only if $R < 2\sigma_1$, which will imply
that the stations are fairly close together. We have

[The expression for J in this case, the substitution made in it, and the resulting forms are illegible in the source.]

b) R large.
Write

[The definition of G, of the form $G = 1 - \exp\{\cdot\}$, has an illegible exponent in the source.]

Then

[The expansion of J in terms of $\sin^{-1}(G^{1/2})$ and powers of G is illegible in the source.]


These  two  expressions  will  be  sufficient  to  cover  the  range  of 
possible  values.  Given  the  values  of  the  several  constants 
entering  into  the  formulae  the  value  of  the  correlation  between 
the  precipitations  at  two  stations  can  accordingly  be  predicted. 

The  formulae  were  rewritten  to  take  account  of  situations  where 
one  target  area  overlapped  the  other  or  alternatively  one  was 
contained  within  the  other.  It  will  be  noted  that  if  one  station 
is  taken  as  center  it  is  possible,  by  varying  R  to  draw 
geographical  contours  on  which  the  correlation  is  constant,  in 
other  words,  to  draw  a  correlation  surface  for  precipitation  data 
such  as  was  envisaged  for  pressure  data. 

10. Practical Considerations Concerning the "Effective"
Stochastic Model for Precipitation. Access was not available to
sufficient data to enable a complete trial of the propriety of the
model suggested to be made. The first and most important point is
to investigate what was called the effective path of the storm.
I was not able to do this. I did however have made available to me,
for California only, some paths of low pressure centers which were
calculated for the Santa Barbara weather project. These data were
rather scanty but from them it was possible to decide that assumptions
of rectangularity for the p.d.f. of D, of $\sin\theta$ for the
p.d.f. of $\theta$, and of a gamma distribution for r, were not
unreasonable. It would seem that numerical values for $\sigma_1$ and $\sigma_2$
should not be difficult to obtain provided it is possible to measure
the precipitation over a short enough period of time, say 5 or 10
minutes, for a group of stations. They will undoubtedly
be variable in practice, but an average value of these will probably
be good enough. If they are found to be of great variability then
the model will need to be extended to take account of this. There
remains to find estimates of B and of AB. It is possible to
obtain these either from the raw data or, which is possibly less
complicated, by taking two pairs of stations, the distances between
the elements of each pair being small but the distance between the
pairs being moderately large. If the actual correlations between
observed data are now calculated, and the formula given here for
the correlation is used, one may back solve to get values for B
and for AB and consequently for A.

11. Further Possible Work Following from the "Effective" Model.
The acid test of any model is how well it accords with the data of
experience and until actual numerical data are analyzed with
the objective of testing the several assumptions which are made,
there is little point in further generalization of the model.
Modifications to accord with numerical results will not be difficult
to make once it is known what is required. One very obvious
generalization which may be necessary--as stated immediately above--
is to allow for variability in the scale parameters of the "drop"
distribution. This may be done fairly simply by assuming for each
of the p.d.f.'s of $\sigma_1$ and $\sigma_2$ an inverse gamma (Pearson Type V)
distribution but whether the resulting integrations can then be
performed is something which has yet to be investigated.

12. Persistence in a Chain of Multiple Events when there is
Simple Dependence. [AFCRL-62-496] During one of my visits to
Mr. Durst we discussed the problem of persistence in weather and
he drew my attention to several papers in the Proceedings of the
Royal Society in which this problem was treated. I studied these
papers but they seemed old fashioned from a statistical point of
view and I decided to see what could be done using a more modern
method of attack. The problem is in essence a relatively simple
one. Given a series of observations which are consecutive in
operational time, it is required to describe the correlation between
any pair of them. As research developed on this problem it was
found more rewarding to work with a persistence factor, $\theta$, which
actually attempted to describe the dependence of each observation
on the one which precedes it, and which is connected with the
correlation in a very simple way.

Suppose a sequence of n consecutive intervals of operational
time; during each interval one only of s mutually exclusive
events $E_i$ ($i = 1, 2, \ldots, s$) must happen. Thus for example we might
describe rainfall in the terms heavy ($E_1$), medium ($E_2$), light ($E_3$),
trace ($E_4$) and none ($E_5$). These would be five mutually exclusive
possibilities. The operational time will be interpreted according
to what we are studying. It might be consecutive actual time
intervals (hours, minutes, days) at the same weather station or
it could be distances with n weather stations instead, or a
combination of the two or anything else in which we are interested.
The chain of events is assumed to have reached the equilibrium
state so that

$$P(E_i) = p_i, \quad i = 1, 2, \ldots, s \quad \text{and} \quad \sum_{i=1}^{s} p_i = 1.$$

If there is no dependence between events, i.e., if for example the
direction from which the wind blew at 10 a.m. was uncorrelated
with the direction from which the wind blew at 9 a.m., then

$$P\{E_i\} = P\{E_i \mid E_j\}, \quad i = 1, 2, \ldots, s; \; j = 1, 2, \ldots, s.$$

We imagine that there will be a correlation of some kind, however,
and allow for it in the following way. Since n consecutive time
intervals are supposed, we have a chain of n consecutive events.
At any point in the chain if we take (say) the kth and (k+1)st
positions, we write

P($E_i$ in the (k+1)st position given $E_i$ in the kth position)
$= \varphi p_i + \theta$; $\varphi = 1 - \theta$.

P($E_j$ in the (k+1)st position given $E_i$ in the kth position)
$= \varphi p_j$, $j \ne i$,

for all $i, j = 1, 2, \ldots, s$. This means that we assume the persistence
factor, $\theta$, is the same for all pairs of like events. Now $\theta$ is
some parameter and it is usually necessary to estimate it from the
data. It is shown that T, the number of transitions from one
event to a different event in the sequence, is a sufficient estimator
for $\theta$ if $p_i = 1/s$, $i = 1, 2, \ldots, s$, and it is accordingly suggested
that an adequate estimator for $\theta$ when the probabilities $p_i$
are unequal will be

$$\hat\theta = 1 - \frac{T}{(n-1)(1-P_2)}$$

where

$$P_2 = \sum_{i=1}^{s} p_i^2.$$

We have also

$$\operatorname{var} T = (n-1)\varphi\left[(1-P_2)(P_2\varphi + \theta) + 2(P_3 - P_2^2)\right] - 2(P_3 - P_2^2)(1 - \theta^{n-1})$$

(from which var $\hat\theta$ is immediate), with

$$P_3 = \sum_{i=1}^{s} p_i^3.$$

$\hat\theta$ is asymptotically normally distributed. Given the probabilities
$\{p_i\}$, therefore, it is possible to test a hypothesis about $\theta$.
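Further, it is suggested that if there are two sequences, the first
yielding an estimate $\hat{\theta}_1$ (with associated probabilities $\{p_i\}$)
and the second yielding an estimate $\hat{\theta}_2$ (with associated
probabilities $\{p'_i\}$), then a test criterion for the equivalence of the
persistences in the sequences might be

$$\hat{\theta}_d = \hat{\theta}_1 - \hat{\theta}_2$$

with

$$\operatorname{var}\hat{\theta}_d = \operatorname{var}\hat{\theta}_1 + \operatorname{var}\hat{\theta}_2$$

and $\hat{\theta}_d$ normally distributed. It will be necessary to use an
estimate of $\theta$ in $\operatorname{var}\hat{\theta}_d$ in order to carry out the test of
significance, and the estimate

$$\hat{\theta} = \frac{\hat{\theta}_1 (n_1 - 1)(1 - P_2) + \hat{\theta}_2 (n_2 - 1)(1 - P'_2)}{(n_1 - 1)(1 - P_2) + (n_2 - 1)(1 - P'_2)}$$

may be used, where $n_1$ and $n_2$ are the numbers in the first and second
sequences, respectively, and

$$P_2 = \sum_i p_i^2, \qquad P'_2 = \sum_i {p'_i}^2.$$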
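A minimal Python sketch of this two-sequence comparison follows. It rests on several assumptions: the function names are hypothetical, the criterion is taken to be the difference of the two estimates divided by its standard error, and the pooled estimate is substituted for $\theta$ in both variances. Each single-sequence estimate is assumed to have the form $1 - T/[(n-1)(1-P_2)]$.

```python
import math

def power_sums(p):
    """Return (P2, P3) = (sum p_i^2, sum p_i^3)."""
    return sum(x**2 for x in p), sum(x**3 for x in p)

def theta_hat(T, n, p):
    P2, _ = power_sums(p)
    return 1.0 - T / ((n - 1) * (1.0 - P2))

def var_theta_hat(n, p, theta):
    """var(T) from the text, divided by [(n-1)(1-P2)]^2."""
    P2, P3 = power_sums(p)
    phi, c = 1.0 - theta, P3 - P2**2
    var_T = ((n - 1) * phi * ((1 - P2) * (P2 * phi + theta) + 2 * c)
             - 2 * c * (1 - theta ** (n - 1)))
    return var_T / ((n - 1) * (1 - P2)) ** 2

def compare_persistence(T1, n1, p1, T2, n2, p2):
    """Return (theta1, theta2, pooled estimate, normal criterion)."""
    w1 = (n1 - 1) * (1 - power_sums(p1)[0])   # weight of the first sequence
    w2 = (n2 - 1) * (1 - power_sums(p2)[0])   # weight of the second sequence
    th1, th2 = theta_hat(T1, n1, p1), theta_hat(T2, n2, p2)
    pooled = (th1 * w1 + th2 * w2) / (w1 + w2)
    # Assumption: the pooled value is plugged into both variance terms.
    var_d = var_theta_hat(n1, p1, pooled) + var_theta_hat(n2, p2, pooled)
    return th1, th2, pooled, (th1 - th2) / math.sqrt(var_d)

# Hypothetical inputs: transition counts, sequence lengths, state probabilities.
result = compare_persistence(20, 50, [0.5, 0.3, 0.2], 25, 40, [0.25] * 4)
print(result)
```

The last value returned would be referred to normal tables. With these weights the pooled estimate is algebraically the same as $1 - (T_1 + T_2)/(w_1 + w_2)$, where $w_1$ and $w_2$ are the two weights in the denominator.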

As an example of how to test a hypothesis about $\theta$ (or $\varphi = 1 - \theta$),
wind data from Batavia for January, 1933 was given. Using past
records the probabilities of the various wind directions in January
at Batavia may be taken to be:

Direction    S     SE    E     NE    N     NW    W     SW    C
Probability  0.02  0.01  0.04  0.05  0.15  0.41  0.30  0.01  0.01

where C stands for calm. These are the $\{p_i\}$. The 31 actual wind
directions for January, 1933 were


W,  NE,  SW,  N,  N,  N,  NW,  W,  N,  NW,  NW,  N,  NW,  W,  N,  N,  N, 


N,  NW,  N,  W,  W,  W,  W,  W,  E,  NW,  W, 


The number of transitions is $T = 19$, which gives an estimate

$$\hat{\theta} = 0.114$$

with

$$\operatorname{var}\hat{\theta} = \varphi(0.05964) - \varphi^2(0.03333) - 0.000433$$

if the term in $\theta^{n-1}$ is neglected. To test the hypothesis $\theta = \theta_0$
(or $\varphi = \varphi_0 = 1 - \theta_0$) we calculate the criterion

$$\frac{\hat{\theta} - \theta_0}{\text{S.E.}(\hat{\theta})}$$

and refer to normal tables. Thus if $\theta_0 = 0.1$ we have

$$\operatorname{var}\hat{\theta} = 0.0262, \qquad \text{S.E.}(\hat{\theta}) = 0.162,$$

and the test criterion is

$$\frac{0.014}{0.162} = 0.09,$$

which is clearly not significant.
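The arithmetic of this example can be retraced with a short Python sketch. It is a check under stated assumptions rather than a definitive computation: the estimate is taken as $1 - T/[(n-1)(1-P_2)]$, and the variance coefficients 0.05964, 0.03333 and 0.000433 are used exactly as printed above, not recomputed from the $p_i$. The variable and function names are illustrative.

```python
import math

# Batavia January probabilities, in the order printed in the table.
p = [0.02, 0.01, 0.04, 0.05, 0.15, 0.41, 0.30, 0.01, 0.01]
n, T = 31, 19                      # days in the month; transitions as reported

P2 = sum(v * v for v in p)
theta_hat = 1.0 - T / ((n - 1) * (1.0 - P2))

def var_theta_printed(phi):
    """var(theta_hat) using the coefficients as printed in the text."""
    return 0.05964 * phi - 0.03333 * phi**2 - 0.000433

theta_0 = 0.1                      # hypothesis under test
var_0 = var_theta_printed(1.0 - theta_0)
se_0 = math.sqrt(var_0)
criterion = (theta_hat - theta_0) / se_0

print(f"P2 = {P2:.4f}, theta_hat = {theta_hat:.3f}")
print(f"var = {var_0:.4f}, S.E. = {se_0:.3f}, criterion = {criterion:.3f}")
```

To three figures this gives the same estimate, variance and standard error as quoted; the criterion comes out a little under 0.09 because the unrounded difference $\hat{\theta} - \theta_0$ is used.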

A preliminary numerical investigation into wind records, with
an estimate of $\theta$ calculated for each month of the year, showed
that the values of $\hat{\theta}$ were approximately cyclical, which might be
a reasonable thing to expect. It is not possible, however, to say
whether or not this effect is a real one without considerably more
numerical analysis, with possible modification of the mathematical
model proposed.

13. Further Research Work Possible on Persistence. The writer
was  informed  that  the  work  on  persistence  fell  only  marginally 
within  the  scope  of  the  contract  and  consequently  further  work  on 
the  topic  was  abandoned.  Since  the  termination  of  the  contract 
I  have  taken  up  the  research  again  and  have  obtained  some  results 
which  will  possibly  eventually  appear  in  a  scientific  journal. 
Briefly  the  further  work  is  as  follows:  To  fix  ideas  let  us  think 
of  persistence  in  wind  direction  with  the  compass  as  a  clock  face. 
In the model proposed in AFCRL-62-496 we wrote

$$p_{ji} = P\{E_j \mid E_i\} = \varphi p_j, \quad i, j = 1, 2, \ldots, s \text{ but } i \ne j.$$

This means that the weights used for the probabilities are the
same and equal to $\varphi$, and do not depend on the fact that the
states $E_i$ and $E_j$ may be far apart. Thus for example if the
state $E_1$ is equivalent to a North direction of the wind, then
the probability of a wind direction NE given a previous North
direction will be $\varphi$ times the probability of a NE direction,
and the probability of a S wind direction given a previous N
direction will be $\varphi$ times the probability of a S direction.

It would appear more realistic to introduce a system of weights for
the transitional probabilities dependent not only on $\varphi$ but on the
angular displacement of the wind direction, assigning a bigger weight
(say) to the transition from N to NE than to the transition from
N to S, and this I have done. Whether such a model will be
appropriate for analyzing the data of experience can only be tested
from actual observational material.
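The weighted model itself is not given in this report, so the following Python sketch shows only one hypothetical way such weights could be introduced; it should not be read as the scheme actually developed. The assumptions are: eight compass points without a calm state, an exponential weight `exp(-lam * d)` in the number of compass points `d` separating two directions, illustrative probabilities `p_compass`, and a probability of remaining in the same state kept at $\varphi p_i + \theta$ as in the earlier model.

```python
import numpy as np

def weighted_transition_matrix(p, theta, lam):
    """Hypothetical variant: off-diagonal terms proportional to p_j * exp(-lam * d_ij)."""
    p = np.asarray(p, dtype=float)
    s = len(p)
    phi = 1.0 - theta
    idx = np.arange(s)
    step = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(step, s - step)        # circular distance in compass points
    w = np.exp(-lam * dist)                  # assumed weight, decreasing with distance
    np.fill_diagonal(w, 0.0)
    stay = phi * p + theta                   # P(E_i | E_i), unchanged from the earlier model
    off = w * p[None, :]
    off *= ((1.0 - stay) / off.sum(axis=1))[:, None]   # rescale so each row sums to 1
    return off + np.diag(stay)

# Order: N, NE, E, SE, S, SW, W, NW (illustrative probabilities only).
p_compass = np.array([0.15, 0.05, 0.04, 0.01, 0.02, 0.01, 0.30, 0.42])
M_w = weighted_transition_matrix(p_compass, theta=0.1, lam=0.5)
print(M_w[0, 1] / p_compass[1], M_w[0, 4] / p_compass[4])   # weight on N->NE vs N->S
```

With `lam = 0` the sketch reduces to the unweighted model, in which every off-diagonal term is $\varphi p_j$. For `lam > 0` the $\{p_i\}$ are in general no longer the equilibrium probabilities of the chain, which is one reason any such scheme would have to be checked against observational material.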

14. General Remarks. The idea of building a stochastic model
to describe the observations made on a particular phenomenon is
obvious enough, although I have not seen anywhere the approach to
model building used here and came to it by myself. It has however
probably been used in different guises many times before. Granted
that the actual dynamics of the various aspects of the weather--
rain, pressure, wind, etc.--are difficult, the approach, backward
as it were, from what is actually observed may be helpful in offering
a guide to the dynamic investigations. As the writer sees it
one should build a tentative model, checking the validity of the
model at each stage from observational data. Once the plausibility
of the model as descriptive of the observed phenomenon is established,
the mathematical consequences of the model can be worked
out. There appears to be a large field of research in investigating
the general models which could be proposed.


